    Animation of a speaking chatbot: The Deep Speaking Avatar project

This thesis combines a review of existing technologies for animating virtual avatars with the development of a custom application for the purpose. The scope was constrained by compatibility with the other components of a chatbot pipeline, by the available computer-graphics techniques, and by the user experience required of a Deep Speaking Avatar application. The other pipeline components were implemented by four other students as their own bachelor's theses. Two of them are direct dependencies of our application: a face-recognition module that guides the rotation of the avatar's head, and a text-to-speech module whose output times the avatar's lip movements. Existing techniques for facial expression modeling and face-audio synchronization were first surveyed. Two of them were tested, but both proved too restricted to be animated and rotated in real time. Each was competent at its intended task, but fulfilling all the requirements of the Deep Speaking Avatar meant lowering the standard of the avatar's realism and building a custom implementation with general-purpose tools. The popular 3D rendering and development tool Unity offered a suitable degree of implementational freedom and technical capability, and Blender was used to craft the 3D mesh model that is animated and rendered in Unity. The main desired functionalities of the avatar were achieved under the specific technical conditions used to test the implementation: the avatar can turn its head approximately toward the user's face, and its lips move at a monotonic pace while it produces speech. The relative locations of the camera, the user, and the screen should not differ much from the test setup, although ideas for improving the application's adaptivity to different physical setups are discussed. Running the Deep Speaking Avatar requires a Linux environment with the other pipeline components, and an integration script written by another student can be used to set up the pipeline. Considerable computational power is needed to run the whole system, because many of the chatbot modules rely on heavy neural networks.
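To make the described behaviour concrete, the following is a minimal Unity C# sketch, not taken from the thesis itself, of how such an avatar controller could look. All names (AvatarController, headBone, FaceWorldPosition, IsSpeaking, and so on) are hypothetical, and the sketch assumes the face-recognition and text-to-speech modules feed it an estimated face position and a speaking flag.

using UnityEngine;

// Hypothetical sketch of the two behaviours named in the abstract:
// turning the head toward a detected face and moving the lips at a
// fixed pace while speech audio plays. Field and property names are
// placeholders, not the thesis's actual identifiers.
public class AvatarController : MonoBehaviour
{
    [SerializeField] private Transform headBone;      // head joint of the Blender-made mesh
    [SerializeField] private Transform jawBone;       // jaw joint driving the lip movement
    [SerializeField] private float turnSpeed = 2f;    // head rotation smoothing factor
    [SerializeField] private float lipRate = 8f;      // how fast the jaw opens and closes
    [SerializeField] private float maxJawAngle = 12f; // degrees of jaw opening

    // Assumed to be updated externally from the face-recognition module's output.
    public Vector3 FaceWorldPosition { get; set; }
    // Assumed to be set true while the text-to-speech module is producing audio.
    public bool IsSpeaking { get; set; }

    private void Update()
    {
        // Smoothly rotate the head toward the detected face position.
        Vector3 toFace = FaceWorldPosition - headBone.position;
        if (toFace.sqrMagnitude > 1e-4f)
        {
            Quaternion target = Quaternion.LookRotation(toFace);
            headBone.rotation = Quaternion.Slerp(
                headBone.rotation, target, turnSpeed * Time.deltaTime);
        }

        // Oscillate the jaw at a constant rate while speech plays;
        // when silent, close the mouth.
        float jawAngle = IsSpeaking
            ? Mathf.Abs(Mathf.Sin(Time.time * lipRate)) * maxJawAngle
            : 0f;
        jawBone.localRotation = Quaternion.Euler(jawAngle, 0f, 0f);
    }
}

In this sketch the jaw simply oscillates at a constant rate whenever speech is playing, which mirrors the monotonic lip movement described above rather than phoneme-accurate lip synchronization.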